Statistics for Next Generation Sequencing – Meeting Report
Abstract
Analysis of RNA-seq data
Profiling the transcriptome has been a central application of NGS technologies. Because the sequencing technology generates short reads, the first step is to map the reads onto the source genome, genes, and transcripts. Despite the development of many algorithms and tools for mapping reads to reference genomes, accurately mapping RNA-seq reads remains a hard problem because of the complexity of the transcriptome. Thomas Wu from Genentech presented his recent work on mapping RNA-seq reads and analyzing gene structure from RNA-seq data with the Genomic Short-read Nucleotide Alignment Program (GSNAP) software package. The approach uses probabilistic models and known splicing information to guide alignment and variant/splicing detection, considering all possible combinations of major and minor alleles. Wu also presented GSTRUCT, a pipeline for assembling alignment results into gene structures and predicting isoforms and gene fusion events.
Unlike microarray data, sequencing data were originally thought to be digital and not to need intensive normalization. However, Zhijin Wu from Brown University demonstrated that gene expression data from RNA-seq can suffer from strong biases related to transcript length and GC content. She showed that these biases can be removed by a carefully formulated statistical model that combines a generalized linear model with a quantile normalization procedure.
The clustering of RNA-seq data often relies on heuristic methods widely used in microarray data analysis. Peng Liu from Iowa State University demonstrated that clustering RNA-seq data with a mixture of negative binomial models, with the model parameters estimated by an Expectation-Maximization (EM) algorithm, can improve performance substantially (R software package MBCluster.Seq on CRAN).
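To make the model-based clustering idea concrete, the following is a minimal Python sketch of EM for a mixture of negative binomial distributions. It is not the MBCluster.Seq implementation: the function names, the toy count matrix, the fixed shared dispersion parameter `size`, the initialization from two hand-picked genes, and the absence of library-size normalization are all simplifying assumptions made for illustration. With the dispersion held fixed, the M-step update for each cluster mean reduces to a responsibility-weighted average of the counts.

```python
import math


def nb_logpmf(y, mu, size):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion parameter size (variance = mu + mu**2 / size)."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def em_nb_mixture(counts, init_means, size=10.0, n_iter=50):
    """Cluster genes (rows of counts) with a negative binomial mixture.

    Assumption: the dispersion parameter is known and shared, so only the
    mixing weights and the per-cluster mean profiles are estimated.
    """
    n_genes = len(counts)
    n_clusters = len(init_means)
    n_samples = len(counts[0])
    means = [[max(float(m), 1e-8) for m in profile] for profile in init_means]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene,
        # computed in log space to avoid numerical underflow.
        resp = []
        for row in counts:
            logp = [
                math.log(weights[k])
                + sum(nb_logpmf(y, means[k][j], size) for j, y in enumerate(row))
                for k in range(n_clusters)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and cluster mean profiles.
        for k in range(n_clusters):
            mass = sum(r[k] for r in resp)
            weights[k] = max(mass / n_genes, 1e-12)
            for j in range(n_samples):
                weighted = sum(r[k] * row[j] for r, row in zip(resp, counts))
                means[k][j] = max(weighted / max(mass, 1e-12), 1e-8)
    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return labels, means, weights


# Toy data (hypothetical): six genes measured in four samples.
counts = [
    [5, 7, 52, 48],
    [3, 6, 60, 55],
    [4, 8, 45, 50],
    [50, 47, 6, 4],
    [62, 58, 3, 7],
    [49, 51, 5, 5],
]
# Initialize the two cluster profiles from two of the genes.
labels, means, weights = em_nb_mixture(counts, [counts[0], counts[3]])
print(labels)
```

A fuller treatment would also estimate the dispersion, account for differences in sequencing depth between samples, and try several initializations, since EM only converges to a local optimum.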
In addition to the methodologically focused talks, Ali Mortazavi from UC Irvine presented preliminary results on RNA editing events in RNA-seq data generated by the ENCODE project. Mortazavi showed that additional RNA-DNA sequence differences exist besides SNPs and canonical RNA editing events. He hypothesized that these additional sequence differences result from unknown technical artifacts, in contrast to the conclusion of Li et al. (2011).
Similar resources
Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project
Between July 18th and 24th, 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how best to characterize the microbial world using next-generation sequencing technologies. The meeting was entitled "Terabase Metagenomics" and was sponsored by the Institute for Computing in Science (ICiS) s...
Statistical Analysis for Next Generation Sequencing: Meeting Report
The boom of next generation sequencing (NGS) technology and its applications to a wide range of biomedical fields has brought about many computational and statistical challenges. The NHGRI-funded small conference "Statistical Analysis for Next Generation Sequencing" was held September 26-27, 2011, in Birmingham, AL, to discuss these statistical challenges and strategies to tackle them. A total o...
Strategies and Clinical Applications of Next Generation Sequencing
DNA sequencing is one of the most valuable techniques in molecular biology and is used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology; the whole human genome can now be sequenced at low cost in several days. NGS technology is simi...
Genome Wide Association Studies, Next Generation Sequencing and Their Application in Animal Breeding and Genetics: A Review
Recently, genetic studies have been revolutionized by next generation sequencing (NGS) technology, and its use is expected to largely eliminate defects in the methods of association studies. NGS technology is becoming the premier tool in genetics. However, at the moment its use is limited, especially in livestock, due to high cost and computation...
I-37: Establishing High Resolution Genomic Profiles of Single Cells Using Microarray and Next-Generation Sequencing Technologies
The nature and pace of genome mutation are largely unknown. Standard methods to investigate DNA mutation rely on arraying or sequencing DNA from a population of cells; hence the genetic composition of individual cells is lost, and de novo mutation in individual cell(s) is concealed within the bulk signal. We developed methods based on (SNP-) arraying and next-generation sequencing of single-cell whole-genom...